Geolocation Prediction in Social Media Data by Finding Location Indicative Words
نویسندگان
چکیده
Geolocation prediction is vital to geospatial applications like localised search and local event detection. Predominately, social media geolocation models are based on full text data, including common words with no geospatial dimension (e.g. today) and noisy strings (tmrw), potentially hampering prediction and leading to slower/more memory-intensive models. In this paper, we focus on finding location indicative words (LIWs) via feature selection, and establishing whether the reduced feature set boosts geolocation accuracy. Our results show that an information gain ratiobased approach surpasses other methods at LIW selection, outperforming state-of-the-art geolocation prediction methods by 10.6% in accuracy and reducing the mean and median of prediction error distance by 45km and 209km, respectively, on a public dataset. We further formulate notions of prediction confidence, and demonstrate that performance is even higher in cases where our model is more confident, striking a trade-off between accuracy and coverage. Finally, the identified LIWs reveal regional language differences, which could be potentially useful for lexicographers.
منابع مشابه
Geolocation Prediction in Twitter Using Location Indicative Words and Textual Features
Knowing the location of a social media user and their posts is important for various purposes, such as the recommendation of location-based items/services, and locality detection of crisis/disasters. This paper describes our submission to the shared task “Geolocation Prediction in Twitter” of the 2nd Workshop on Noisy User-generated Text. In this shared task, we propose an algorithm to predict ...
متن کاملImproving the utility of social media with Natural Language Processing
Social media has been an attractive target for many natural language processing (NLP) tasks and applications in recent years. However, the unprecedented volume of data and the non-standard language register cause problems for off-the-shelf NLP tools. This thesis investigates the broad question of how NLP-based text processing can improve the utility (i.e., the effectiveness and efficiency) of s...
متن کاملText-Based Twitter User Geolocation Prediction
Geographical location is vital to geospatial applications like local search and event detection. In this paper, we investigate and improve on the task of text-based geolocation prediction of Twitter users. Previous studies on this topic have typically assumed that geographical references (e.g., gazetteer terms, dialectal words) in a text are indicative of its author’s location. However, these r...
متن کاملGeolocating Blogs from Their Textual Content
Mashups showing the geographic location of the authors of social media content are popular. They generally depend on the authors reporting their own location. For blogs, automated geolocation strategies using IP address and domain name are not adequate for determining an author’s location. Instead, we detail textual geolocation techniques suitable for tagging social media data, facilitating dev...
متن کاملCarmen: A Twitter Geolocation System with Applications to Public Health
Public health applications using social media often require accurate, broad-coverage location information. However, the standard information provided by social media APIs, such as Twitter, cover a limited number of messages. This paper presents Carmen, a geolocation system that can determine structured location information for messages provided by the Twitter API. Our system utilizes geocoding ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012